Collocation Lattices and Maximum Entropy Models

Author

  • Andrei Mikheev
Abstract

The maximum entropy framework has proved to be expressive and powerful for statistical language modelling, but building the models is computationally expensive. The iterative scaling algorithm used for parameter estimation is costly, and the feature selection process requires estimating the model parameters many times for many candidate features. In this paper we present a novel approach to building maximum entropy models. Our approach uses a feature collocation lattice and selects the atomic features without resorting to iterative scaling. After the atomic features have been selected, we use iterative scaling to compile a fully saturated model for the maximal constraint space and then start to eliminate the most specific constraints. Since at every point during constraint deselection we have a fully fitted maximum entropy model, we rank the constraints on the basis of their weights in the model. Therefore we do not have to use iterative scaling during constraint ranking and apply it only for linear model regression. Another important improvement is that, since the simplified model deviates from the previous, larger model in only a small number of constraints, we use the parameters of the old model as the initial values of the parameters for the iterative scaling of the new one. This decreased the number of required iterations by about tenfold. As practical results, we discuss how our method has been applied to several language modelling tasks, such as sentence boundary disambiguation, part-of-speech tagging and automatic document abstracting.
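The warm-start idea in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's implementation: the toy event space, the empirical probabilities, the feature names, the helper functions `distribution`, `expectations` and `fit`, and the use of a generalized-iterative-scaling-style update without a correction feature. It fits a model over three atomic features and two collocations, drops the constraint with the smallest absolute weight, and refits the reduced model once from the old weights and once from zero.

```python
import itertools
import math

# Hypothetical toy data: events are triples of binary atomic attributes.
EVENTS = list(itertools.product([0, 1], repeat=3))
EMPIRICAL = [0.20, 0.05, 0.10, 0.15, 0.05, 0.10, 0.05, 0.30]

# Atomic features plus two collocations (conjunctions of atomic features).
FEATURES = {
    "a": lambda x: x[0],
    "b": lambda x: x[1],
    "c": lambda x: x[2],
    "ab": lambda x: x[0] * x[1],
    "bc": lambda x: x[1] * x[2],
}


def distribution(weights):
    """Return p(x) proportional to exp(sum of active feature weights)."""
    scores = [
        math.exp(sum(w * FEATURES[n](x) for n, w in weights.items()))
        for x in EVENTS
    ]
    z = sum(scores)
    return [s / z for s in scores]


def expectations(probs, names):
    """Expected value of each named feature under the given probabilities."""
    return {n: sum(p * FEATURES[n](x) for p, x in zip(probs, EVENTS)) for n in names}


def fit(names, init=None, tol=1e-8, max_iter=100000):
    """Iterative scaling; returns (weights, number of scaling updates)."""
    names = list(names)
    # Warm start: reuse old weights where available, otherwise start at zero.
    weights = {n: (init or {}).get(n, 0.0) for n in names}
    target = expectations(EMPIRICAL, names)
    # Scaling constant: the largest number of features active on any event.
    scale = max(sum(FEATURES[n](x) for n in names) for x in EVENTS)
    updates = 0
    while updates < max_iter:
        model = expectations(distribution(weights), names)
        if max(abs(target[n] - model[n]) for n in names) < tol:
            break
        for n in names:
            weights[n] += math.log(target[n] / model[n]) / scale
        updates += 1
    return weights, updates


full, _ = fit(FEATURES)
# Rank constraints by weight in the fitted model; drop the weakest one.
weakest = min(full, key=lambda n: abs(full[n]))
kept = [n for n in FEATURES if n != weakest]
warm_w, warm_updates = fit(kept, init=full)  # start from the old parameters
cold_w, cold_updates = fit(kept)             # start from zero
print(weakest, warm_updates, cold_updates)
```

Both refits should arrive at the same reduced model; the printed update counts show how much work each starting point needed on this toy problem, which need not match the roughly tenfold saving reported in the abstract.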


Similar resources

Feature Lattices for Maximum Entropy Modelling

The maximum entropy framework has proved to be expressive and powerful for statistical language modelling, but it suffers from the computational expense of model building. The iterative scaling algorithm used for parameter estimation is computationally expensive, while the feature selection process might require estimating parameters for many candidate features many times. In ...


Feature Lattices and Maximum Entropy Models (Ref: mach1379-rm; Editor: Ray Mooney)

The maximum entropy framework has proved to be expressive and powerful for statistical language modelling, but it suffers from the computational expensiveness of model building. The iterative scaling algorithm that is used for parameter estimation is rather slow while the feature selection process might require parameters for many candidate features to be estimated many times. In this paper we p...


A Note on the Bivariate Maximum Entropy Modeling

Let X = (X1, X2) be a continuous random vector. Under the assumption that the marginal distributions of X1 and X2 are given, we develop models for the vector X when there is partial information about the dependence structure between X1 and X2. The models, which are obtained based on the well-known Principle of Maximum Entropy, are called the maximum entropy (ME) mo...


Using a maximum entropy model to build segmentation lattices for MT

Recent work has shown that translating segmentation lattices (lattices that encode alternative ways of breaking the input to an MT system into words), rather than text in any particular segmentation, improves translation quality of languages whose orthography does not mark morpheme boundaries. However, much of this work has relied on multiple segmenters that perform differently on the same inpu...


Evaluation of Dynamical Spectra for Zero-temperature Quantum Monte Carlo Simulations: Hubbard Lattices and Continuous Systems

Dynamical spectra for Hubbard lattices and simple atoms are obtained using ground state projection (zero-temperature) quantum Monte Carlo and the maximum entropy method. For Hubbard lattices we show that results are equivalent to those obtained from maximum entropy deconvolutions of low-temperature grand canonical quantum Monte Carlo data. These calculations are resolution limited and fail to p...


Publication date: 1997